Abstract

Graphical models are frequently used to represent the topological structures of various complex networks. Current criteria for assessing different models of a network rely mainly on how closely a model matches the network in terms of topological characteristics. Typical topological metrics are the clustering coefficient, the distance distribution, the largest eigenvalue of the adjacency matrix, and the gap between the first and the second largest eigenvalues, which are widely used to evaluate and compare different models of a network. In this paper, we show that evaluating complex network models based on the current topological metrics can be quite misleading. Taking several models of the AS-level Internet as examples, we show that although a model may seem good at describing the Internet in terms of the aforementioned topological characteristics, it can be far from realistic in representing the performance of the real Internet, such as its robustness against intentional attacks and its traffic load distribution. We further show that assessing network models by examining topological characteristics such as the clustering coefficient and the distance distribution is not useful if the robustness of the Internet against random node removals is the only concern. Our findings shed new light on how to reasonably evaluate different models of a network, not only the Internet but also other types of complex networks.
1. Introduction



The structural properties of a complex network are typically very complicated, due to the large size and intrinsic interconnection patterns of the network. To better understand the structure of a complex network and to analyze its dynamical behavior, it is almost always necessary to construct a simplified abstract model that preserves the most fundamental structural characteristics of the network.

Graphs, consisting of nodes and links connecting them in some form, have been widely used to model and analyze the structures and dynamical behaviors of various complex networks [1]. Representative examples include the following. For social networks, Saban et al. [2] proposed a growth network model to represent the bilateral investment treaties (BIT) network; Kitsak et al. [3] proposed a scale-free model to describe the business firm network; Vieira et al. [4] investigated the sexual transmission of HIV within a population based on a small-world network model; Yang et al. [5] studied the spreading scheme of viral marketing based on a network model. For biological networks, Nicolau and Schoenauer [6] introduced a network model to reproduce some statistical measurements of the gene regulatory network; Nacher and Araki [7] suggested an evolutionary model to rebuild the degree distribution of the ncRNA-protein interaction network; Sneppen et al. [8] presented a simplified model to understand large-scale regulatory networks; Ponten et al. [9] examined the relationship between structural and functional connectivity on the basis of the EEG neural mass model. For communication and transportation networks, Boas et al. [10] developed a modified geographical model to discuss worldwide highway networks; Wang and Loguinov [11] derived a wealth-based Internet model to study the AS-level Internet topology. Many other examples of this graph-theoretic approach can easily be given.

Once a model is constructed to describe a network, it immediately needs to be evaluated to see whether it is "good" at representing the network and, moreover, whether it is "better" than other existing models designed for the same network.

At present, researchers rely mainly on the topological characteristics of a network for such modeling, verification, and comparison. A commonly adopted approach within the network science community is to consider a model "good" for a network if it can reproduce some basic topological characteristics of that network. Further, a model is considered superior to the others if it matches the network better in terms of these topological characteristics. In short, topological characteristics are today's litmus test for network models. Specifically, well-studied topological characteristics such as the node degree distribution, clustering coefficient, distance distribution, and spectrum of the adjacency matrix are widely used to validate a model of a network or to evaluate and compare different models of the network. For instance, Nacher and Araki [7] used the degree distribution to evaluate their proposed model of an ncRNA-protein interaction network. Wang and Loguinov [11] asserted that their proposed wealth-based Internet model is better than other existing models because, on a set of real data, it better captures the clustering coefficient and the distance distribution of the AS-level Internet. Toivonen et al. [12] utilized the degree distribution, clustering coefficient, and community structure to compare different models of friendship and email networks.

A more careful examination of the modeling issue reveals that the performance of a network's specific functions is more important than its topological characteristics, unless the latter truly determine the former, since a network (e.g., the Internet) is designed or formed to carry out certain intended functions and tasks.

In this paper, taking models of the AS-level Internet topology as the underlying test-bed, we first show that the current approach of using purely topological characteristics to evaluate different models may be misleading in model selection. Specifically, we show that different existing Internet models can differ little in resisting random removals even though they are very different in major topological metrics, namely the clustering coefficient, the distance distribution, the largest eigenvalue of the adjacency matrix, and the gap between the first and the second largest eigenvalues. Consequently, if the robustness of the Internet against random removals is the main concern, then Internet models should not be assessed based only on such topological characteristics.

Furthermore, we show that some models that closely match the Internet in terms of the aforementioned topological metrics can be very unsuitable for investigating the robustness of the Internet against intentional attacks and its traffic load distribution. Such criteria for Internet modeling can therefore be misleading in selecting good models for describing the real Internet, at least at the AS level.

The rest of the paper is organized as follows. Section 2 provides background and preliminaries on network models and their topological metrics. Section 3 compares some basic topological characteristics, and Section 4 compares robustness against random failures and intentional attacks as well as data traffic performance, all by simulation. Section 5 briefly concludes the investigation.



2. Network models and their topological metrics 

For the AS-level Internet, since the first observation of power laws by Faloutsos et al. [13], several power-law models, such as the BA [14], EBA [15], Fitness [16], GLP [17], HOT [18], PFP [19], MLW [20], and WIT [11] models, have been proposed to describe the Internet topology, although many of them were not originally intended for the Internet. To evaluate and compare these models against the real Internet, many basic topological characteristics have been examined and discussed, including the following:

Clustering coefficient: it measures how densely the neighbors of a node are interconnected; in a social network it is popularly understood as the probability that two friends of a person are themselves friends. It is an important characteristic for robustness against node and link removals, and even for routing algorithms in computer networks, since a node with a higher clustering coefficient generally has higher path diversity.
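As a minimal illustration of this definition, the Python sketch below computes the local clustering coefficient of a node, the network average $C$, and the average clustering coefficient of all nodes with a given degree $k$ (the kind of curve compared in Figure 2). It assumes an undirected graph stored as a dictionary mapping each node to the set of its neighbors. Nodes of degree below 2 are assigned a coefficient of 0, which is one common convention and not necessarily the one used for the results reported here.

```python
from itertools import combinations


def local_clustering(adj, v):
    """Fraction of pairs of v's neighbors that are themselves linked."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention: C = 0 when degree < 2
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Network-wide average clustering coefficient C."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


def clustering_by_degree(adj):
    """Average clustering coefficient of the nodes of each degree k."""
    buckets = {}
    for v in adj:
        buckets.setdefault(len(adj[v]), []).append(local_clustering(adj, v))
    return {k: sum(cs) / len(cs) for k, cs in sorted(buckets.items())}
```

For example, in a triangle with one extra node hanging off one corner, that corner has clustering coefficient 1/3, the other two triangle nodes have 1, and the pendant node has 0.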

Distance distribution: it measures the probability that a randomly selected pair of nodes is separated by a given distance (number of hops). As a global topological characteristic, it plays a vital role in many Internet applications, such as routing and resisting virus spreading.
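A minimal sketch of how the distance distribution and the average distance can be computed, assuming hop-count distances and a graph stored as a dictionary of neighbor sets: a breadth-first search from every node counts how many ordered node pairs lie at each distance. Pairs in different connected components are simply ignored, which is an assumption of this sketch.

```python
from collections import Counter, deque


def bfs_distances(adj, source):
    """Hop distance from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def distance_distribution(adj):
    """P(d): fraction of connected ordered node pairs at distance d."""
    counts = Counter()
    for s in adj:
        counts.update(d for d in bfs_distances(adj, s).values() if d > 0)
    total = sum(counts.values())
    return {d: c / total for d, c in sorted(counts.items())}


def average_distance(adj):
    """Mean of the distance distribution."""
    return sum(d * p for d, p in distance_distribution(adj).items())
```

For a network of about 22,000 nodes this performs one traversal per node, which is feasible but slow in pure Python; sampling source nodes is a common shortcut.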

The largest eigenvalue and the gap between the first and the second largest eigenvalues of the adjacency matrix: the eigenvalues of the adjacency matrix of a network represent another global characteristic of the network. In particular, the largest eigenvalue of the adjacency matrix and the gap between the first and the second largest eigenvalues are very important, because the former is key to the robustness of the network against removals of nodes and links, and the latter is closely related to the maximum traffic throughput of the network. Here, the adjacency matrix $A = (a_{ij})_{N \times N}$ is defined by setting $a_{ij} = 1$ if nodes $i$ and $j$ are connected, and $a_{ij} = 0$ otherwise.
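The two largest eigenvalues of a sparse adjacency matrix can be estimated without forming the matrix explicitly. The sketch below uses power iteration, a standard technique chosen here for self-containment rather than taken from this paper. The matrix is shifted by the maximum degree $c$, which bounds the spectral radius of $A$, so all eigenvalues of $A + cI$ are non-negative and the dominant one corresponds to the algebraically largest eigenvalue of $A$. The second eigenvalue is then obtained by keeping the iterate orthogonal to the first eigenvector (deflation). The dictionary-of-neighbor-sets graph format is an assumption.

```python
import math
import random


def top_two_eigenvalues(adj, iters=5000, tol=1e-12, seed=0):
    """Estimate the first and second largest adjacency eigenvalues."""
    nodes = list(adj)
    idx = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    c = float(max(len(adj[v]) for v in nodes))  # max degree >= spectral radius

    def matvec(x):
        # y = (A + c I) x, using the neighbor sets instead of a dense matrix
        y = [c * xi for xi in x]
        for v in nodes:
            i = idx[v]
            for u in adj[v]:
                y[i] += x[idx[u]]
        return y

    def normalize(x):
        s = math.sqrt(sum(xi * xi for xi in x))
        return [xi / s for xi in x] if s > 0 else x

    def project_out(x, basis):
        # remove components along already-found eigenvectors
        for b in basis:
            d = sum(xi * bi for xi, bi in zip(x, b))
            x = [xi - d * bi for xi, bi in zip(x, b)]
        return x

    rng = random.Random(seed)
    basis, values = [], []
    for _ in range(2):
        x = normalize(project_out([rng.random() + 0.5 for _ in range(n)], basis))
        mu = 0.0
        for _ in range(iters):
            y = project_out(matvec(x), basis)
            new_mu = sum(xi * yi for xi, yi in zip(x, y))  # Rayleigh quotient
            x = normalize(y)
            converged = abs(new_mu - mu) < tol
            mu = new_mu
            if converged:
                break
        basis.append(x)
        values.append(mu - c)  # undo the shift
    return values[0], values[1]
```

The gap discussed above is then simply the difference of the two returned values. For networks of the size studied here, a sparse Lanczos solver such as SciPy's `scipy.sparse.linalg.eigsh` with `k=2, which='LA'` would typically converge much faster than this plain power iteration, whose speed depends on the ratio of the shifted eigenvalues.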

These three basic topological characteristics have been frequently used to evaluate newly proposed models of the Internet. For example, Wang and Loguinov [21] compared the wealth-based evolution model with the BA model, the generalized linear preferential model [22], and the HOT model [18] by examining whether they can reproduce the average clustering coefficient and average distance of the real Internet topology. They also used the dynamical behaviors of the average clustering coefficient, the average distance, and the second smallest nonzero eigenvalue of the normalized Laplacian matrix to compare different models [11]. Bu and Towsley [17] argued that their proposed model is better than the others by evaluating its degree of resemblance to the Internet in terms of the power-law exponent, average clustering coefficient, and average distance.

On the other hand, the computer networking community is usually concerned with the following two widely studied issues:

Robustness against random failures and intentional attacks: on the Internet, events such as equipment failures, power outages, traffic overload, and distributed denial-of-service (DDoS) attacks occur frequently. Such incidents should have little effect on the effective operation of the network as a whole; that is, the Internet should be robust against them.

Traffic load distribution: in Internet traffic engineering, the traffic load distribution pattern of the Internet is very important because it can be used to estimate the potential traffic on nodes and links and to locate potential congestion points in the Internet.

Taking all the aforementioned network metrics and concerns into consideration, this paper aims to answer the following question: if an Internet model is "good" because it "closely" matches the real Internet in terms of the three key topological characteristics mentioned above, is it also "good" at capturing the robustness of the network against random failures and intentional attacks and at reproducing the Internet's traffic load distribution pattern?

To address this question, the familiar BA, EBA, Fitness, and MLW models, which can be precisely formulated and programmed, are used below to investigate the AS-level Internet topology constructed from the daily data collected by UCLA [23] on 15 May 2005.
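To illustrate what "precisely formulated and programmed" means for the simplest of these models, here is a textbook sketch of the BA preferential-attachment process, which is known to produce a power-law degree distribution with exponent close to 3. It starts from a complete graph on $m + 1$ nodes and attaches each new node to $m$ distinct existing nodes chosen with probability proportional to their degree. The seed graph, the function name, and the absence of any Internet-specific tuning are assumptions of this sketch; it is not the exact configuration used for the simulations in this paper, where the models are matched to the Internet data as described below.

```python
import random


def barabasi_albert(n, m, seed=0):
    """Grow an n-node BA graph; each new node brings m links (m >= 1)."""
    rng = random.Random(seed)
    # seed graph (an assumption): complete graph on m + 1 nodes
    adj = {v: set() for v in range(m + 1)}
    for a in range(m + 1):
        for b in range(a + 1, m + 1):
            adj[a].add(b)
            adj[b].add(a)
    # every node appears in `stubs` once per incident link, so a uniform
    # draw from `stubs` picks a node with probability proportional to degree
    stubs = [v for v in adj for _ in adj[v]]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        adj[new] = set()
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            stubs.extend((new, t))
    return adj
```

A call such as `barabasi_albert(21999, 2)` produces a graph with as many nodes as the networks compared below, although its average degree and degree correlations will generally not match those of the Internet.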



Noting the argument [24] that the degree distribution of the AS-level Internet is not a power law but a Weibull distribution or something else, we plot, for verification, the cumulative degree distributions of the UCLA data collected on 15 May of each year from 2004 to 2010 in Figure 1. Over this six-year period, the cumulative degree distributions of the real Internet data turned out to be very similar to each other, and they all resemble a power law (though not exactly), even though the size of the Internet increased dramatically during those years. For this reason, some traditional Internet models, such as the random graph [25] and the Tiers and Transit-Stub models [22], are excluded from our comparisons below. Meanwhile, previous observations have shown [26] that three characteristics, namely the average degree, the degree distribution, and the joint degree distribution, are key to reproducing an Internet-like topology. Therefore, the models considered here, namely the BA, EBA, Fitness, and MLW models, use the same set of values of these three characteristics whenever possible (see [20] for more details on performing such simulations).
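The cumulative degree distributions of Figure 1 can be computed with a few lines of code. The sketch below assumes that "cumulative" means the complementary form $P(K \ge k)$, the fraction of nodes with degree at least $k$, which is the form usually plotted on log-log axes when checking for power-law behavior; for a power-law degree distribution with exponent $\gamma$, this curve is roughly a straight line of slope $-(\gamma - 1)$. The dictionary-of-neighbor-sets graph format is again an assumption.

```python
from collections import Counter


def degree_ccdf(adj):
    """Return {k: P(K >= k)} for every degree k that occurs in the graph."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    n = len(adj)
    ccdf, remaining = {}, n
    for k in sorted(counts):
        ccdf[k] = remaining / n  # nodes whose degree is at least k
        remaining -= counts[k]
    return ccdf
```

For a star with one hub and three leaves this returns `{1: 1.0, 3: 0.25}`.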

It should be remarked that our emphasis here is neither to claim which model best represents the Internet nor to optimize a model to best fit the Internet topology, although such comparisons will be made from time to time. Rather, the aim is to demonstrate that topological metrics should not be used as the criteria for modeling the Internet. In other words, the concern is the performance, especially the robustness, of a model versus that of the real Internet; therefore, it is often unnecessary to tune the model parameters to best fit the snapshots of the Internet topology data in simulations.

3. Comparison of basic topological characteristics 

The values of some basic topological metrics obtained from our extensive simulations, including the network size (number of nodes), power-law exponent, assortativity coefficient, average clustering coefficient, average distance, and largest eigenvalue of the adjacency matrix, are summarized in Table 1.
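Of the quantities in Table 1, the assortativity coefficient $r$ is perhaps the least self-explanatory: it is the Pearson correlation between the degrees of the two end nodes of a link, so a negative value, as for the Internet, means that high-degree nodes tend to attach to low-degree ones. Below is a minimal sketch of Newman's edge-based formula, assuming an undirected simple graph stored as a dictionary of neighbor sets with comparable node labels; the estimator actually used for Table 1 is not stated in this paper.

```python
def degree_assortativity(adj):
    """Pearson correlation of the degrees at the two ends of each link."""
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    m = 0
    s_prod = s_sum = s_sq = 0.0
    for v, nbrs in adj.items():
        for u in nbrs:
            if v < u:  # count each undirected link once (labels must be comparable)
                j, k = deg[v], deg[u]
                m += 1
                s_prod += j * k
                s_sum += 0.5 * (j + k)
                s_sq += 0.5 * (j * j + k * k)
    mean = s_sum / m
    denom = s_sq / m - mean ** 2
    if denom == 0:
        return float("nan")  # e.g. regular graphs: correlation undefined
    return (s_prod / m - mean ** 2) / denom
```

A star, in which every link joins the hub to a leaf, is perfectly disassortative ($r = -1$), while a path of four nodes gives $r = -0.5$.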

It can be observed from Table 1 that the MLW and EBA models are closer to the Internet in terms of average clustering coefficient, average distance, and largest eigenvalue. Clearly, the MLW and EBA models are better than the BA and Fitness models if models are compared by these topological characteristics.

Figure 2 shows the relationship between the clustering coefficient and the node degree $k$ for the Internet and all the models studied. It can be seen that high-degree nodes of the Internet have lower clustering coefficients while low-degree nodes have higher clustering coefficients, which is consistent with the observations [27] that the core of the Internet is loosely connected and its structure is clearly hierarchical. It can also be observed from Figure 2 that the clustering coefficient of the MLW model is closer to that of the Internet than those of the other models.

Table 1: Values of topological parameters for the Internet and the network models. N is the number of nodes, γ is the power-law exponent, r is the assortativity coefficient, C is the average clustering coefficient, d is the average distance between nodes, and λ is the largest eigenvalue of the adjacency matrix.

         Internet      BA      EBA   Fitness      MLW
    N       21999   21999    21999     21999    21999
    γ        2.18     3.0     2.69      2.45     2.36
    r       -0.18   -0.02     0.02     -0.11     0.03
    C        0.46   0.003     0.01      0.01     0.24
    d        3.49    4.14     3.49      3.71     3.45
    λ      141.12   27.82    62.83     39.16   111.87

Figure 3 displays the distance distributions of the Internet and the models. It can be seen that the BA and Fitness models have a Poisson-like distance distribution, which peaks around a certain distance $d_0$ and decays exponentially as the distance $d$ moves away from $d_0$. Clearly, the MLW and EBA models are better than the BA and Fitness models at capturing the distance distribution of the Internet.

Figure 4 depicts the first and second largest eigenvalues of the adjacency matrices of the Internet and the models. For the Internet, the largest eigenvalue is quite large, and there is a big gap between the first and the second largest eigenvalues. It can be observed that both the largest eigenvalue and the gap between the first and the second largest eigenvalues are bigger in the MLW and EBA models but smaller in the BA and Fitness models. Clearly, if only the largest eigenvalue is of concern, the MLW and EBA models are better than the BA and Fitness models, and the same conclusion holds when both the largest eigenvalue and the gap between the first and the second largest eigenvalues are considered.

In summary, the MLW model is the best choice among the studied models for fitting the Internet topology, if the models are evaluated by topological characteristics such as the average clustering coefficient, average distance, clustering coefficient distribution, distance distribution, largest eigenvalue, and gap between the first and the second largest eigenvalues.
4. Comparison of performances in robustness and data traffic



The robustness of a network against attacks and failures can be studied through $S_f$, the size of the largest connected component after a fraction $f$ of the nodes has been randomly or intentionally removed from the original network, whose size is $S_0$. The ratio $S_f/S_0$ thus measures how many nodes can still communicate with each other after that fraction of nodes has been removed.
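A minimal sketch of this removal experiment, assuming the graph is stored as a dictionary mapping each node to its set of neighbors. For intentional attacks, the sketch ranks nodes once by their degree in the intact network; recomputing degrees after every removal is an equally plausible reading of "decreasing order of the node degrees", and the two can give different curves. $S_0$ is taken as the size of the largest connected component of the intact network, which equals $N$ when the network is connected.

```python
import random
from collections import deque


def largest_component_size(adj, removed):
    """Size of the largest connected component after deleting `removed`."""
    seen = set(removed)
    best = 0
    for s in adj:
        if s in seen:
            continue
        seen.add(s)
        size, queue = 0, deque([s])
        while queue:  # breadth-first search over one surviving component
            v = queue.popleft()
            size += 1
            for u in adj[v]:
                if u not in seen:
                    seen.add(u)
                    queue.append(u)
        best = max(best, size)
    return best


def robustness_curve(adj, fractions, attack=False, seed=0):
    """S_f / S_0 for each removal fraction f in `fractions`."""
    nodes = list(adj)
    if attack:
        # assumed attack order: highest initial degree first, computed once
        order = sorted(nodes, key=lambda v: len(adj[v]), reverse=True)
    else:
        order = nodes[:]
        random.Random(seed).shuffle(order)
    s0 = largest_component_size(adj, ())
    return [largest_component_size(adj, order[:int(f * len(nodes))]) / s0
            for f in fractions]
```

Removing only the hub of a five-node star, i.e. $f = 0.2$ under attack, leaves four isolated leaves and hence $S_f/S_0 = 0.2$, a small-scale version of the fragility of hub-dominated networks to targeted removal.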

Figure 5 compares the robustness against random removals of nodes. One can observe that the models differ little from one another. However, as discussed above, these models are very different in their topological characteristics, namely the clustering coefficient, distance distribution, largest eigenvalue, and gap between the first and the second largest eigenvalues. Consequently, if the ability of the Internet to resist random removals is the only concern, then a good model of the Internet should not be assessed by topological metrics.

One can also observe that the BA and Fitness models have different power-law exponents but behave similarly in resisting random removals. Therefore, as far as the robustness of the Internet against random removals is concerned, a model need not exactly reproduce the power-law exponent of the real Internet; even the simplest toy BA model can roughly reflect the Internet's robustness against random removals. This also shows that the so-called "robust yet fragile" property of the Internet [28], or of its BA model, does not essentially depend on the power-law degree distribution of its topology.

Figure 6 compares the robustness against intentional attacks. Here, as usual, an intentional attack means that nodes are removed one after another in decreasing order of their degrees. It can be observed from the figure that the MLW model is again the best while the EBA model is the worst in reflecting the Internet's robustness against intentional attacks. However, both the MLW and EBA models are better than the BA and Fitness models at reproducing the average clustering coefficient and average distance. Therefore, if Internet models are evaluated based on the average clustering coefficient and average distance, as was done in [17], then the "best" or a "better" model so selected will be truly misleading. Note also that the EBA model is better than the Fitness model at reproducing the Internet's topological characteristics, including the clustering coefficient, distance distribution, largest eigenvalue, and gap between the first and the second largest eigenvalues. However, the Fitness model is closer to the Internet than the EBA model in matching the robustness against intentional attacks. Thus, one way or another, evaluating these models based on topological characteristics leads to misleading results.

Next, to investigate the traffic load distribution, it is natural to study $T(r)$, the ratio of the traffic load carried by the first $r$ largest nodes to the total traffic load of the whole network. Here, it is assumed that a data packet is sent from node $i$ to node $j$ for every possible pair of nodes $(i, j)$. For simplicity, we do not take into account the time delay of data transmission at nodes and links, and we adopt the Open Shortest Path First (OSPF) routing protocol to transmit data packets. Thus, the traffic load of a node is defined as the total number of packets passing through it when every pair of nodes sends and receives one packet between them.
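The load definition above can be sketched as follows, under assumptions that the text leaves open: OSPF is modeled as hop-count shortest-path routing (all link costs equal to 1); each ordered pair $(i, j)$ uses a single shortest path, chosen by breadth-first-search order when several exist; and a packet is counted at the nodes that relay it, excluding its source and destination. With these choices, the packets that node $v$ relays for source $s$ are exactly the destinations below $v$ in the shortest-path tree rooted at $s$, so one traversal per source suffices. Whether "largest nodes" in $T(r)$ means largest by load or by degree is also an assumption, so the ranking is left as a parameter.

```python
from collections import deque


def node_traffic_loads(adj):
    """Packets relayed by each node when every ordered pair exchanges one
    packet along a single BFS shortest path (hop-count routing)."""
    load = {v: 0 for v in adj}
    for s in adj:
        parent, order = {s: None}, [s]
        queue = deque([s])
        while queue:  # BFS tree rooted at s; ties follow neighbor iteration order
            v = queue.popleft()
            for u in adj[v]:
                if u not in parent:
                    parent[u] = v
                    order.append(u)
                    queue.append(u)
        subtree = {v: 1 for v in order}
        for v in reversed(order[1:]):  # children are processed before parents
            subtree[parent[v]] += subtree[v]
        for v in order[1:]:
            load[v] += subtree[v] - 1  # destinations routed through v
    return load


def top_r_share(load, r, rank=None):
    """T(r): share of total load on the first r nodes, ranked by `rank`
    (e.g. a degree dict) if given, otherwise by load itself."""
    score = rank if rank is not None else load
    top = sorted(load, key=lambda v: score[v], reverse=True)[:r]
    total = sum(load.values())
    return sum(load[v] for v in top) / total if total else 0.0
```

In a star with four leaves, the hub relays all 12 leaf-to-leaf packets, so $T(1) = 1$; a curve of $T(r)$ that rises this steeply for small $r$ is what "heterogeneous" means in the discussion of Figure 7.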

Figure 7 shows the traffic load distributions of the Internet and the models. It can be seen that the traffic load distribution of the Internet is quite heterogeneous: a small fraction of the largest nodes carry most of the traffic load of the network, while a large number of low-degree nodes carry only a small portion of the total traffic. Compared with the BA and Fitness models, the MLW and EBA models significantly underestimate the heterogeneity of the traffic load distribution of the Internet. Again, a "better" model determined by the average clustering coefficient and average distance, or by the clustering coefficient, distance distribution, largest eigenvalue, and gap between the first and the second largest eigenvalues, can be misleading too: a model that is closer to the Internet in topological characteristics can be very poor at reflecting the traffic load distribution, and vice versa.

5. Conclusions 

Several comparable network models of the AS-level Internet have been investigated, analyzed, and compared in terms of their topological characteristics, such as the clustering coefficient, the distance distribution, and the largest eigenvalue of the adjacency matrix as well as the gap between its first and second largest eigenvalues. The comparison reveals that a model that is better than the others at matching the topological characteristics of the Internet may actually be the worst at representing some critical performances and behaviors of the Internet, such as its robustness against random failures or intentional attacks and its traffic load distribution. We therefore conclude that evaluating complex network models based on current topological metrics can be misleading, at least in the scenario of the AS-level Internet. Our findings may shed new light on realistic modeling of more general complex networks.
Figure 2: Comparison of clustering coefficient.
Figure 4: Comparison of the first and second largest eigenvalues of the adjacency matrix.
Figure 7: Comparison of traffic load distribution.